ABSTRACT


Markov chains with large transition probability matrices
occur in many applications such as manpower models. Under
certain conditions the state space of a stationary discrete
parameter finite Markov chain may be partitioned into
subsets, each of which may be treated as a single state of a
smaller chain that retains the Markov property. Such a chain
is said to be "lumpable" and the resulting lumped chain is a
special case of more general functions of Markov chains.

There are several reasons why one might wish to lump.
First, there may be analytical benefits, including relative
simplicity of the reduced model and development of a new
model which inherits known or assumed strong properties of
the original model (the Markov property). Second, there may
be statistical benefits, such as increased robustness of the
smaller chain as well as improved estimates of transition
probabilities. Finally, the identification of lumps may
provide new insights about the process under investigation.

However, a problem that arises in connection with practical
applications of Markov chain models is to determine
whether the chain is lumpable. This is especially difficult
when the matrix $P = [p_{ij}]$ of transition probabilities is
estimated from transition data. In this case, it is desirable
to find bounds on $\Delta$, the largest error,
$|\hat{p}_{ij} - p_{ij}|$, in estimating $p_{ij}$, for all i and j.

This thesis examines the sensitivity of the lumping
conditions based on $\hat{P}$, the estimate of $P$. In general, it is
found that the classical lumping conditions are extremely
sensitive to the estimation error which can be expected to
occur even with large data sets. Thus, these conditions may
be of limited value in many actual applications.
I.  INTRODUCTION

Markov chains with large transition probability matrices
occur in many applications such as manpower models. Under
certain conditions the state space of a stationary discrete
parameter finite Markov chain may be partitioned into
subsets, each of which may be treated as a single state of a
smaller chain that retains the Markov property. Such a chain
is said to be "lumpable" and the resulting lumped chain is a
special case of more general functions of Markov chains.

Consider a Markov chain $\{X_t : t = 0, 1, 2, \ldots\}$ with finite
state space $S = \{1, 2, \ldots, n\}$, stationary transition
probability matrix $P = (p_{ij})$, and a priori distribution of
"initial states", $p^0 = (p_1^0, p_2^0, \ldots, p_n^0)$. Let $\tilde{S}$ denote a
nontrivial partition of $S$ into m < n "lumps", say
$\tilde{S} = \{L(1), L(2), \ldots, L(m)\}$. If $\{X_t\}$ is lumpable with respect
to $\tilde{S}$, denote by $\{\tilde{X}_t\}$ the lumped chain with state space $\tilde{S}$ and
transition probability matrix $\tilde{P}$.

A well-known characterization [Ref. 2] is that $\{X_t\}$ is
lumpable to $\{\tilde{X}_t\}$ if and only if there exist matrices A and B
such that

$$BAPB = PB \qquad (1.1)$$

where B consists of m nonzero orthogonal n-dimensional
column vectors whose components are zeros or ones, and A is
$B'$ with rows normalized to probability vectors (i.e.,
$A = (B'B)^{-1}B'$). The positions of the 1's in each column of B
correspond to states in $S$ that together form a lump in $\tilde{S}$.
It follows that if $BAPB = PB$ is satisfied, then $\tilde{P} = APB$, as
is shown in Chapter 2.


Many of the mathematical quantities associated with $\{X_t\}$
can be transformed directly to corresponding quantities for
$\{\tilde{X}_t\}$, using the lumping matrix B. In Chapter 2, for example,
we show that if an original Markov chain $\{X_t\}$ is lumpable to
$\{\tilde{X}_t\}$ and $\{\tilde{X}_t\}$ is further lumpable to $\{\tilde{\tilde{X}}_t\}$, then $\{X_t\}$ is
directly lumpable to $\{\tilde{\tilde{X}}_t\}$, and we give the lumping matrix
for $\{\tilde{\tilde{X}}_t\}$ in terms of the underlying two lumpings.

There are several reasons why one might wish to lump
[Ref. 1]. First, there may be analytical benefits,
including relative simplicity of the reduced model and
development of a new model which inherits known or assumed
strong properties of the original model (the Markov
property). Second, there may be statistical benefits, such as
increased robustness of the smaller chain as well as
improved estimates of transition probabilities. Finally,
the identification of lumps may provide new insights about
the process under investigation.

However, a problem that arises in connection with practical
applications of Markov chain models is to determine
whether the chain is lumpable. For chains with large state
spaces $S$, it is practically impossible to use an exhaustive
search to determine whether lumpability conditions such as
those given in equation (1.1) are met for some matrices B,
because of the large number of ways of partitioning $S$, i.e., the
large number of candidate B matrices. For example, if $S$ has
10 elements, there are 115,975 partitions of $S$.

Another problem is to estimate the matrix $P = (p_{ij})$ of
transition probabilities and to find bounds on $\Delta$, the
largest error $|\hat{p}_{ij} - p_{ij}|$ over all i and j. We shall
investigate the sensitivity of the lumping conditions in equation
(1.1) for varying $\Delta$. If $\{X_t\}$ is lumpable with lumping
matrix B, is condition (1.1) satisfied with $P$ replaced by
the estimate $\hat{P}$?


This thesis will attempt to examine the sensitivity of
the lumping conditions based on reasonable estimation errors
$\Delta$ when $P$ is not known and must be estimated by $\hat{P}$. We
describe facts about lumpability using eigenvalues and
eigenvectors, including the theorem mentioned by D.R. Barr
and M.D. Thomas [Ref. 3]. We do not review elementary
concepts of Markov chains here; the reader may wish to
consult [Ref. 2] and [Ref. 4] for review of basic facts and
specific terminologies such as lumpability, regular Markov
chain, etc.
II.  THEORY OF LUMPABILITY

This chapter covers general facts about lumping, such
as conditions for lumping, the number of partitions
possible for any given size of state space $S$, and theorems
associated with eigenvector conditions for Markov chain
lumpability.

A.   CONDITIONS FOR LUMPING

Consider a Markov chain $\{X_t : t = 0, 1, 2, \ldots\}$ with finite
state space $S = \{1, 2, \ldots, n\}$, stationary transition
probability matrix $P = (p_{ij})$, and a priori distribution of
"initial states", $p^0 = (p_1^0, p_2^0, \ldots, p_n^0)$. Let $\tilde{S}$ denote a
nontrivial partition of $S$ into m < n "lumps", that is
$\tilde{S} = \{L(1), L(2), \ldots, L(m)\}$. If $\{X_t\}$ is lumpable with
respect to $\tilde{S}$, denote by $\{\tilde{X}_t\}$ the lumped chain with state
space $\tilde{S}$ and transition probability matrix $\tilde{P}$.

We now show that if the condition (1.1) for lumpability
with respect to the lumping matrix B,

$$BAPB = PB \qquad (2.1)$$

is satisfied, then the lumped transition matrix $\tilde{P}$ is given
by

$$\tilde{P} = APB \qquad (2.2)$$


Proof. $\tilde{p}_{ij}$ is the sum $\sum_{k \in L(j)} p_{ik}$, where $L(j)$ is the partition
subset containing $j \in S$ and i is any element of $L(i)$. By
the lumpability condition, this value is the same for any
$i \in L(i)$. But the product $PB$ sums the columns of $P$ in
accordance with the partition subsets indicated by the columns of
B. Hence, $PB$ is an n x m matrix with rows repeated in
accordance with the partition sets $L(1), L(2), \ldots, L(m)$; the
effect of pre-multiplying by $A = (B'B)^{-1}B'$ is to "average"
these common rows, yielding an m x m matrix $\tilde{P}$ without the
repeated rows. But such "averages" are just the common rows
being averaged. Hence, $\tilde{P} = APB$ is the m x m transition
matrix of the lumped chain with state space
$\{L(1), L(2), \ldots, L(m)\}$.

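Example 1. Consider a transition probability matrix $P$
with 4 states which can be partitioned into
$\tilde{S} = \{\{1\}, \{2,3\}, \{4\}\} = \{L(1), L(2), L(3)\}$. Let

$$P = \begin{bmatrix} 1/4 & 1/16 & 3/16 & 1/2 \\ 0 & 1/12 & 1/12 & 5/6 \\ 0 & 1/12 & 1/12 & 5/6 \\ 7/8 & 1/32 & 3/32 & 0 \end{bmatrix}$$

Then

$$B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

We know equation (2.1) is satisfied with partitioning
$\tilde{S} = \{L(1), L(2), L(3)\}$. Thus, the lumped transition matrix is

$$APB = \tilde{P} = \begin{bmatrix} 0.25 & 0.25 & 0.5 \\ 0 & 0.167 & 0.833 \\ 0.875 & 0.125 & 0 \end{bmatrix}$$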

Many of the mathematical quantities associated with $\{X_t\}$
can be transformed directly to corresponding quantities for
$\{\tilde{X}_t\}$, using A and B of equation (2.1). For example, since
$AB$ is the m-dimensional identity matrix, it follows that for
s a positive integer,

$$(\tilde{P})^s = (APB)^s = A(PB)A(PB) \cdots A(PB) = AP^sB \qquad (2.3)$$

We now show that $(\tilde{P})^s = AP^sB$:

$$\begin{aligned}
(\tilde{P})^s &= (APB)(APB)(APB) \cdots (APB) \\
&= AP(BAPB)(APB) \cdots (APB) \\
&= AP(PB)(APB) \cdots (APB) \\
&= AP^2(BAPB) \cdots (APB) \\
&= AP^2 PB \cdots (APB) \\
&\ \ \vdots \\
&= AP^sB .
\end{aligned}$$

But $AP^sB = \widetilde{P^s}$, since

$$\begin{aligned}
BAPB &= PB \\
PBAPB &= P^2B \\
BAPBAPB &= P^2B \\
BAP^2B &= P^2B \\
&\ \ \vdots \\
BAP^sB &= P^sB ,
\end{aligned}$$

so $P^s$ is lumpable with the same matrix B and $\widetilde{P^s} = AP^sB$.
This implies in turn that if $\{X_t\}$ has steady state
distribution $\pi$, then $\{\tilde{X}_t\}$ has steady state distribution $\tilde{\pi} = \pi B$.

Theorem 1. The steady state distribution $\tilde{\pi}$ of the
lumped chain $\{\tilde{X}_t\}$ is $\pi B$, where $\pi = \pi P$.

Proof.

$$\pi B = (\pi P)B = \pi B(APB) = (\pi B)\tilde{P}$$

Therefore, $\tilde{\pi} = \pi B$.
Similarly, the a priori distribution $\tilde{p}^0$ of the initial
state of the lumped chain corresponding to that of the
original chain, $p^0$, is given by $\tilde{p}^0 = p^0B$, since by equations
(2.1) and (2.3),

$$\begin{aligned}
p^0P^sB &= p^0 P \cdots PB \\
&= p^0 P \cdots PBAPB \\
&= p^0 P \cdots PB\tilde{P} \\
&= p^0B\tilde{P}^s .
\end{aligned}$$

Note that $p^0P^sB$ is the distribution of lumped states
occupied by the lumped chain after s transitions. Since this
equals $p^0B\tilde{P}^s = \tilde{p}^0\tilde{P}^s$, it follows that $\tilde{p}^0 = p^0B$.


B.   PARTITIONS OF A SET OF STATES

The matrix B consists of m nonzero orthogonal
n-dimensional column vectors whose components are zeros and
ones which determine a specific partition of
$S = \{1, 2, \ldots, n\}$. Example 1 illustrates this, where the
state space $S = \{1, 2, 3, 4\}$ is partitioned into
$\tilde{S} = \{\{1\}, \{2,3\}, \{4\}\} = \{L(1), L(2), L(3)\}$, and

$$B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} .$$

Permutations of these columns give a matrix which also
lumps $\{X_t\}$. In order to see this, let $B^*$ be B with columns
permuted in some order. Then $B^* = BI^*$, where $I^*$ is the
identity matrix with its columns permuted in the same order. Now
if $BAPB = PB$, then

$$\begin{aligned}
B^*A^*PB^* &= B^*(B^{*\prime}B^*)^{-1}B^{*\prime}PB^* \\
&= BI^*(I^{*\prime}B'BI^*)^{-1}I^{*\prime}B'PBI^* \\
&= BI^*(I^*)^{-1}(B'B)^{-1}(I^{*\prime})^{-1}I^{*\prime}B'PBI^* \\
&= B(B'B)^{-1}B'PBI^* \\
&= BAPBI^* \\
&= PBI^* \\
&= PB^* ,
\end{aligned}$$

so it follows that $\{X_t\}$ is also lumpable with respect to the
matrix $B^*$.

Now, how many candidate lumping matrices are there? This
would be the number of partitions of $S$. [Ref. 5] gives a
recursion relation for the number $A_N$ of ways of partitioning
a set $S = \{1, 2, \ldots, N\}$:

$$A_N = \sum_{k=0}^{N-1} \binom{N-1}{k} A_k \qquad (N \ge 1,\ A_0 = 1) \qquad (2.4)$$

From this relation we find $A_1 = 1$, $A_2 = 2$, $A_3 = 5$, $A_4 = 15$, etc.
Values of $A_N$ for larger N are shown in Table 1. The sizes of
the entries in Table 1 show that it would be impossible to use
a trial and error approach to finding lumping matrices B for a
chain with a larger state space, say with 10 or more elements.
                        TABLE 1
            Partitions of a Set of N States

     N    Partitions          N    Partitions

     5            52         20    5.172415    x 10^13
     6           203         30    8.467490145 x 10^23
     7           877         40    1.574505884 x 10^35
     8          4140         50    1.857242688 x 10^47
     9         21147         60    9.769393075 x 10^59
    10        115975         70    1.80750039  x 10^73
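The counts of partitions can be regenerated from the recursion for the number of partitions of an N-element set (these are the Bell numbers, with $A_0 = 1$ and $A_N = \sum_{k=0}^{N-1} \binom{N-1}{k} A_k$). The sketch below is illustrative only; the function name `partition_counts` is an assumption, not something defined in the thesis.

```python
from math import comb

def partition_counts(n_max):
    """Return [A_0, A_1, ..., A_{n_max}], the number of partitions of an N-set.

    Uses A_N = sum_{k=0}^{N-1} C(N-1, k) * A_k with A_0 = 1.
    """
    counts = [1]  # A_0 = 1
    for N in range(1, n_max + 1):
        counts.append(sum(comb(N - 1, k) * counts[k] for k in range(N)))
    return counts

counts = partition_counts(70)
for N in (5, 6, 7, 8, 9, 10):
    print(N, counts[N])
for N in (20, 30, 40, 50, 60, 70):
    # Larger values are printed in scientific notation, as in Table 1.
    print(N, f"{counts[N]:.6e}")
```

Python integers are arbitrary precision, so the exact values for N up to 70 are available and only the printing is rounded.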


It is of interest to be able to systematically prescribe
alternative lumpings by generating matrices B for a given
transition matrix $P$, using some method other than trial and
error. In the next section, we describe an approach to
finding B matrices using the eigenvalues and eigenvectors of
$P$.

C.   AN EIGENVECTOR CONDITION FOR MARKOV CHAIN LUMPABILITY

Many problems in science and mathematics deal with a
linear operator $T : V \to V$, and it is of importance to
determine those scalars $\lambda$ for which the equation $Tx = \lambda x$ has
nonzero solutions x. In this section we discuss this
problem and its relationship with finding matrices B.

Theorem  2.  The  value  1  is  always  an  eigenvalue  for  any 
Markov   chain   transition   probability    matrix. 

Proof. Let $P$ be any n x n transition probability matrix
of $\{X_t\}$, x be a left eigenvector in $R^n$, and $\lambda$ be the
corresponding eigenvalue of $P$. Then $xP = x\lambda$, which is equivalent
to

$$x(P - \lambda I) = 0 \qquad (2.5)$$

For $\lambda$ to be an eigenvalue, there must be a nonzero solution
x of equation (2.5). Equation (2.5) will have a nonzero
solution if and only if

$$\det(P - \lambda I) = 0 . \qquad (2.6)$$

This is called the characteristic equation. To show that
$\lambda = 1$ always satisfies equation (2.6), we need only show
that the columns of the matrix in equation (2.6) are
linearly dependent. Note that

$$(P - I) = \begin{bmatrix}
p_{11} - 1 & p_{12} & \cdots & p_{1n} \\
p_{21} & p_{22} - 1 & \cdots & p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n1} & p_{n2} & \cdots & p_{nn} - 1
\end{bmatrix} \qquad (2.7)$$

Since $\sum_{j=1}^{n} p_{ij} = 1$ for Markov chains, it follows that each row
of the matrix in equation (2.7) sums to zero. The columns therefore
add to the zero vector and are linearly dependent, so the
determinant in equation (2.6) is zero with $\lambda = 1$. It follows that
$\lambda = 1$ is an eigenvalue of the Markov chain $\{X_t\}$. We next
consider properties of eigenvectors corresponding to the
eigenvalue $\lambda = 1$.

Theorem 3. For any regular Markov chain, components of
the eigenvector corresponding to $\lambda = 1$ are proportional to
the steady state distribution of $\{X_t\}$.
Proof. Let x be a left eigenvector of $P$, and $\lambda$ be the
corresponding eigenvalue of $P$, such that $xP = x\lambda$, and
assume $\sum_i x_i = 1$. For given $\lambda = 1$, $xP = x$. The steady state
distribution of $\{X_t\}$ is unique [Ref. 4]. Therefore, x must
be the steady state distribution $\pi$, since $\sum_i x_i = 1$.
The following example demonstrates Theorem 3.

Example 2. Let

$$P = \begin{bmatrix} 1/4 & 1/16 & 3/16 & 1/2 \\ 0 & 1/12 & 1/12 & 5/6 \\ 0 & 1/12 & 1/12 & 5/6 \\ 7/8 & 1/32 & 3/32 & 0 \end{bmatrix} .$$

The eigenvectors corresponding to the eigenvalues of $P$ are
displayed as column vectors below:

    Eigenvalues:      1          0        -0.25     -0.3333

    Eigenvectors:   0.7367       0        10.5       0.8247
                    0.09209   -0.7071     -0.375    -0.03436
                    0.2236     0.7071     -4.125    -0.2405
                    0.6315       0        -6        -0.5498

Note that $\pi = (\pi_1, \pi_2, \pi_3, \pi_4) = (0.4375, 0.0547, 0.1328, 0.375)$,
where

$$\pi_1 = \frac{0.7367}{0.7367 + 0.09209 + 0.2236 + 0.6315} ,$$

etc.
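As a numerical cross-check of Theorem 3 and Theorem 1 on the 4-state matrix used in the examples, the steady state distribution can be approximated by repeatedly applying $x \leftarrow xP$. This is a sketch under the assumption that the chain is regular (so the iteration converges); the names `left_multiply` and `stationary` are illustrative, and power iteration is a substitute technique, not the eigenvector routine used in the thesis.

```python
def left_multiply(x, M):
    """Return the row vector x M."""
    return [sum(x[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def stationary(P, iterations=200):
    """Approximate the steady state distribution by power iteration on x -> xP."""
    x = [1.0 / len(P)] * len(P)  # start from the uniform distribution
    for _ in range(iterations):
        x = left_multiply(x, P)
    return x

P = [
    [1 / 4, 1 / 16, 3 / 16, 1 / 2],
    [0, 1 / 12, 1 / 12, 5 / 6],
    [0, 1 / 12, 1 / 12, 5 / 6],
    [7 / 8, 1 / 32, 3 / 32, 0],
]
# Lumping matrix for the partition {{1}, {2,3}, {4}} and the lumped matrix APB.
B = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
P_lumped = [[1 / 4, 1 / 4, 1 / 2], [0, 1 / 6, 5 / 6], [7 / 8, 1 / 8, 0]]

pi = stationary(P)
pi_lumped = left_multiply(pi, B)  # Theorem 1: steady state of the lumped chain

print([round(v, 4) for v in pi])
print([round(v, 4) for v in pi_lumped])
```

The vector `pi_lumped` should be left unchanged by `P_lumped`, which is what Theorem 1 asserts.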


Theorem 4. Eigenvectors corresponding to eigenvalues
other than 1 are orthogonal to $e = (1, 1, \ldots, 1)$.

Proof. $xe' = x(Pe') = (xP)e' = \lambda xe'$. Therefore, $xe'$
must be zero for $\lambda \ne 1$.

We are also interested in finding the relationship
between eigenvalues of $P$ and those of the lumped transition
probability matrix $\tilde{P}$, where $\tilde{P} = APB$ as described above.
Theorem 5. Suppose $\{X_t\}$ with transition matrix $P$ is
lumpable to $\{\tilde{X}_t\}$ with transition matrix $\tilde{P}$. The eigenvalues
of $\tilde{P}$ are eigenvalues of $P$.

Proof. Let $a(\lambda) = 0$ be the (n-th degree) characteristic
equation of $P$. By the Cayley-Hamilton theorem [Ref. 6],

$$a(P) = a_nP^n + a_{n-1}P^{n-1} + \cdots + a_1P + a_0I = 0 ,$$

which together with equation (2.3) implies

$$\begin{aligned}
A\,a(P)\,B &= a_n\tilde{P}^n + a_{n-1}\tilde{P}^{n-1} + \cdots + a_0I \\
&= a(\tilde{P}) \\
&= 0 .
\end{aligned}$$

Since $\tilde{P}$ satisfies $P$'s characteristic equation and since
eigenvalues of $a(\tilde{P})$ are of the form $a(\tilde{\lambda})$, it follows that
$a(\tilde{\lambda}) = 0$. Thus all eigenvalues $\tilde{\lambda}$ of $\tilde{P}$ are also
eigenvalues of $P$.

We next examine the eigenvectors of $P$ and $\tilde{P}$, with the
aim of identifying lumpings of $\{X_t\}$ directly in terms of the
eigenvectors of $P$. We have seen that $\tilde{p}^0$ is obtained directly
as $p^0B$; a similar relationship holds with eigenvectors of $P$.

Theorem 6. Suppose x is a left eigenvector of $P$
corresponding to eigenvalue $\lambda$, and suppose $\{X_t\}$ is lumpable to a
chain with transition matrix $\tilde{P} = APB$. Then $xB$ satisfies the
equation $(xB)\tilde{P} = (xB)\lambda$.

Proof. By equation (2.1), $(xB)\tilde{P} = xBAPB = xPB$. But
$xP = x\lambda$, and the result follows.

We note that $xB$ is not necessarily an eigenvector of $\tilde{P}$
because it may be zero. In fact, it easily follows that
$xB = 0$ if $\lambda$ is not an eigenvalue of $\tilde{P}$. But $xB$ may be null
even if $\lambda$ is an eigenvalue of $\tilde{P}$, in cases where $\lambda$ is a
repeated eigenvalue of $P$ more times than of $\tilde{P}$.
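Theorem 6 and the remark about null images can be checked exactly on the 4-state example with partition {{1}, {2,3}, {4}}. In the sketch below the two left eigenvectors are rescaled to integers for convenience: (28, -1, -11, -16) for eigenvalue -1/4 and (0, 1, -1, 0) for eigenvalue 0. That scaling and the helper name `vec_mat` are assumptions made for illustration.

```python
from fractions import Fraction as F

def vec_mat(x, M):
    """Return the row vector x M."""
    return [sum(x[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

P = [
    [F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
    [F(0), F(1, 12), F(1, 12), F(5, 6)],
    [F(0), F(1, 12), F(1, 12), F(5, 6)],
    [F(7, 8), F(1, 32), F(3, 32), F(0)],
]
B = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
P_lumped = [
    [F(1, 4), F(1, 4), F(1, 2)],
    [F(0), F(1, 6), F(5, 6)],
    [F(7, 8), F(1, 8), F(0)],
]

# Left eigenvector for lambda = -1/4, which is also an eigenvalue of the lumped matrix.
lam = F(-1, 4)
x = [28, -1, -11, -16]
xB = vec_mat(x, B)
print(vec_mat(x, P) == [lam * v for v in x])           # x P = lambda x
print(vec_mat(xB, P_lumped) == [lam * v for v in xB])  # (xB) P~ = lambda (xB)

# Left eigenvector for lambda = 0, which is not an eigenvalue of the lumped matrix.
x0 = [0, 1, -1, 0]
print(vec_mat(x0, P), vec_mat(x0, B))  # both are zero vectors
```

The second case illustrates why $xB$ need not be an eigenvector of the lumped matrix: the lumping can map a nonzero eigenvector to the zero vector.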
[Ref. 7] pointed out some other useful properties
associated with eigenvalues and eigenvectors, such as: 1) if the
matrix $P$ is symmetric, then eigenvalues are real and
eigenvectors are different for repeated eigenvalues; 2) if the
matrix is not symmetric, then the eigenvectors may be the same
for repeated eigenvalues.

Theorem 7. If $\{X_t\}$ with transition matrix $P$ is lumpable
to $\{\tilde{X}_t\}$ with transition matrix $\tilde{P}$, and $\{\tilde{X}_t\}$ is lumpable to
$\{\tilde{\tilde{X}}_t\}$ with transition matrix $\tilde{\tilde{P}}$, then $\{X_t\}$ is directly
lumpable to $\{\tilde{\tilde{X}}_t\}$, where $\{\tilde{\tilde{X}}_t\}$ is the lumped chain of $\{\tilde{X}_t\}$.

Proof. Let $\{X_t\}$ be lumpable to $\{\tilde{X}_t\}$, and $\{\tilde{X}_t\}$ be
lumpable to $\{\tilde{\tilde{X}}_t\}$ by matrices $B_1$ and $B_2$, where $B_1$ and $B_2$ are
lumping matrices in which the dimension n x m of $B_1$ is
greater than that of $B_2$. By equation (2.2), $\tilde{P} = A_1PB_1$ and
$\tilde{\tilde{P}} = A_2\tilde{P}B_2$. Thus,

$$\tilde{\tilde{P}} = A_2\tilde{P}B_2 = A_2(A_1PB_1)B_2 = (A_2A_1)P(B_1B_2)$$

To see that $B_1B_2$ is a lumping matrix and $A_2A_1$ is of the
required form, we need to show that $(A_2A_1)(B_1B_2)$ is the
identity matrix as mentioned in Section A. But

$$(A_2A_1)(B_1B_2) = A_2(A_1B_1)B_2 = A_2IB_2 = A_2B_2 = I$$

Also, note that $B_1B_2$ is $B_1$ lumped by $B_2$, so $B_1B_2$ has columns
of the required form. Therefore, $\{X_t\}$ is directly lumpable
to $\{\tilde{\tilde{X}}_t\}$, by the lumping matrix $B = B_1B_2$.
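Example 3. Consider a Markov chain with 5 states, and
transition probability matrix

$$P = \begin{bmatrix}
0.3 & 0.1 & 0.2 & 0.1 & 0.3 \\
0.1 & 0.3 & 0.1 & 0.3 & 0.2 \\
0.5 & 0.1 & 0 & 0.1 & 0.3 \\
0.1 & 0.5 & 0.2 & 0.1 & 0.1 \\
0.5 & 0 & 0.1 & 0.2 & 0.2
\end{bmatrix} .$$

First, consider $S = \{1, 2, 3, 4, 5\}$, which can be partitioned to
$\tilde{S} = \{\{1\}, \{2,4\}, \{3,5\}\} = \{L(1), L(2), L(3)\}$. The corresponding
lumping matrices are

$$B_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
A_1 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0.5 & 0 & 0.5 \end{bmatrix} ,$$

and the lumped transition probability matrix is

$$\tilde{P} = A_1PB_1 = \begin{bmatrix} 0.3 & 0.2 & 0.5 \\ 0.1 & 0.6 & 0.3 \\ 0.5 & 0.2 & 0.3 \end{bmatrix} .$$

Secondly, consider $\tilde{S}$ with 3 states, which can be partitioned
to $\tilde{\tilde{S}} = \{\{1,3\}, \{2\}\} = \{L'(1), L'(2)\}$, with matrices

$$B_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
\quad \text{and} \quad
A_2 = \begin{bmatrix} 0.5 & 0 & 0.5 \\ 0 & 1 & 0 \end{bmatrix} .$$

The corresponding lumped transition matrix is

$$\tilde{\tilde{P}} = A_2\tilde{P}B_2 = \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix} .$$

Finally, consider lumping the transition probability matrix
directly. For the partitioning
$\tilde{\tilde{S}} = \{\{1,3,5\}, \{2,4\}\} = \{L''(1), L''(2)\}$,

$$B = B_1B_2 =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} ,
\qquad
A = (B'B)^{-1}B' = \begin{bmatrix} 1/3 & 0 & 1/3 & 0 & 1/3 \\ 0 & 1/2 & 0 & 1/2 & 0 \end{bmatrix} ,$$

and the directly lumped transition probability matrix is

$$\tilde{\tilde{P}} = APB = \begin{bmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{bmatrix} .$$

(The product $A_2A_1$ itself has first row $(1/2, 0, 1/4, 0, 1/4)$
rather than $(1/3, 0, 1/3, 0, 1/3)$; either averaging matrix gives the
same result here because the rows of $PB$ are identical within
each lump.)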
Theorem 7 shows that lumping is "transitive", in the
following sense. Define two transition matrices $P$ and $Q$ to
be equivalent, $(P \sim Q)$, if $Q = \tilde{P}$ for a lumping matrix B
whose columns are those of the identity matrix, in some
permuted order. (Thus the chains $\{X_t\}$ and $\{Y_t\}$ differ only in
the labels associated with their states.) Define a relation
"$\le$" between transition matrices as follows: $Q \le P$ if and
only if $Q = \tilde{P}$ for some lumping matrix B. Then Theorem 7
shows that $Q \le P$, $R \le Q \Rightarrow R \le P$. This relation "$\le$" is
reflexive, since $Q \le Q$ using the lumping matrix I
(identity). Finally, "$\le$" is antisymmetric, since $Q \le P$ and
$P \le Q \Rightarrow P \sim Q$. Thus, the set of all transition probability
matrices is partially ordered by the "lumping" partial
order, "$\le$".
III.  BOUNDS ON THE LARGEST ERROR, $\Delta$, IN $\hat{P}$


In this chapter we consider three procedures to find
bounds on $\Delta$. First, we use the central limit theorem for
given i and j. Secondly, we use a binomial approximation on
the basis of the first procedure. Finally, we get the
largest error $\Delta$, using the asymptotic extreme value
distribution. These three approximations are only designed to give
a rough idea of the relationships between $\Delta$ and the number
M of elements in the state space, the total number of
observed transitions $K_{\cdot\cdot}$, and the probability $\alpha$.

A.   APPROACH USING CENTRAL LIMIT THEOREM

We are interested in the sizes of the errors between the
estimate $\hat{P}$ and the unknown $P$, where $P$ is the transition
probability matrix of $\{X_t\}$. We assume the transition
probability matrix $P$ is of size M x M.

Let $K_{ij}$ be the number of observed transitions of $\{X_t\}$
from state i to state j, and let $K_{i\cdot}$ be the number of
observed transitions from state i. Similarly, $K_{\cdot j}$ is the
number of observed transitions into state j.

Let $p_{ij}$ be an unknown transition probability from state i
to state j and $\hat{p}_{ij}$ be an estimate of $p_{ij}$ based on $K_{i\cdot}$ observed
transitions. Then the usual estimate $\hat{p}_{ij}$ of $p_{ij}$ is the ratio
of $K_{ij}$ to $K_{i\cdot}$. Now, as a rough approximation, imagine that
$K_{i\cdot}$ is fixed, and the number of transitions from state i to
state j, $K_{ij}$, is Binomial$(K_{i\cdot}, p_{ij})$. Then by the central
limit theorem,
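$\hat{p}_{ij}$ is approximately Normal$\left(p_{ij},\ \dfrac{p_{ij}(1 - p_{ij})}{K_{i\cdot}}\right)$, since

$$E[\hat{p}_{ij}] = E\left[\frac{K_{ij}}{K_{i\cdot}}\right] = p_{ij}$$

and

$$\mathrm{Var}[\hat{p}_{ij}] = \mathrm{Var}\left[\frac{K_{ij}}{K_{i\cdot}}\right] = \frac{\mathrm{Var}[K_{ij}]}{(K_{i\cdot})^2} = \frac{K_{i\cdot}\,p_{ij}(1 - p_{ij})}{(K_{i\cdot})^2} = \frac{p_{ij}(1 - p_{ij})}{K_{i\cdot}} . \qquad (3.1)$$

We want to find a bound $\Delta$ on the estimation error $|\hat{p}_{ij} - p_{ij}|$
which occurs with probability at least $\alpha$; that is, the
largest $\Delta$ for which

$$P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta\,] \ge \alpha .$$

Now

$$P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta\,] = P\left[\frac{|\hat{p}_{ij} - p_{ij}|}{\sqrt{\dfrac{K_{i\cdot}\,p_{ij}(1 - p_{ij})}{(K_{i\cdot})^2}}} \ge \frac{\Delta}{\sqrt{\dfrac{K_{i\cdot}\,p_{ij}(1 - p_{ij})}{(K_{i\cdot})^2}}}\right] . \qquad (3.2)$$

Let

$$Z = \frac{\hat{p}_{ij} - p_{ij}}{\sqrt{p_{ij}(1 - p_{ij})/K_{i\cdot}}} ,$$

then Z is approximately standard Normal. Rewrite equation (3.2)
as

$$P\left[\,|Z| \ge \frac{\Delta\sqrt{K_{i\cdot}}}{\sqrt{p_{ij}(1 - p_{ij})}}\right] \ge \alpha ; \qquad 0 < \alpha < 1 .$$

Equation (3.2) is approximately

$$P\left[Z \ge \frac{\Delta\sqrt{K_{i\cdot}}}{\sqrt{p_{ij}(1 - p_{ij})}}\right] \ge \frac{\alpha}{2}$$

since the Normal distribution is symmetric. Solving for $\Delta$,
we have

$$\Delta \le N^{-1}\left(1 - \frac{\alpha}{2}\right)\frac{\sqrt{p_{ij}(1 - p_{ij})}}{\sqrt{K_{i\cdot}}}$$

where $N^{-1}(1 - \frac{\alpha}{2})$ is the $(1 - \frac{\alpha}{2})$ quantile of the
standard Normal distribution. Suppose the steady state
probability $\pi_i$ of state i is $\frac{1}{M}$, based on the equally likely
case, and suppose the worst case in which

$$p_{ij}(1 - p_{ij}) = \frac{1}{4} \quad \left(\text{that is, } p_{ij} = \tfrac{1}{2}\right) .$$

Then an approximate value for $\Delta$ is given by

$$\begin{aligned}
\Delta &= N^{-1}\left(1 - \frac{\alpha}{2}\right)(0.5)\frac{1}{\sqrt{K_{i\cdot}}} \\
&= N^{-1}\left(1 - \frac{\alpha}{2}\right)(0.5)\frac{1}{\sqrt{\pi_i K_{\cdot\cdot}}} \\
&= N^{-1}\left(1 - \frac{\alpha}{2}\right)(0.5)\frac{1}{\sqrt{K_{\cdot\cdot}/M}} \\
&= N^{-1}\left(1 - \frac{\alpha}{2}\right)(0.5)\sqrt{\frac{M}{K_{\cdot\cdot}}} . \qquad (3.3)
\end{aligned}$$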
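The single-entry bound can be evaluated numerically. The sketch below reads equation (3.3) as $\Delta = N^{-1}(1 - \alpha/2)\,(0.5)\,\sqrt{M/K_{\cdot\cdot}}$, which assumes equally likely states and the worst case $p_{ij} = 1/2$; the function name `delta_fixed_entry` and the sample inputs are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def delta_fixed_entry(alpha, M, K_total):
    """Approximate error bound for one fixed (i, j), as read from equation (3.3).

    alpha   -- probability with which the error is at least the returned value
    M       -- number of states
    K_total -- total number of observed transitions (K..)
    """
    quantile = NormalDist().inv_cdf(1 - alpha / 2)  # N^{-1}(1 - alpha/2)
    return quantile * 0.5 * sqrt(M / K_total)

for K_total in (1000, 5000, 20000):
    print(K_total, round(delta_fixed_entry(0.90, 20, K_total), 5))
```

Because the bound scales as $\sqrt{M/K_{\cdot\cdot}}$, quadrupling the number of observed transitions only halves it.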

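Equation (3.3) concerns the error $|\hat{p}_{ij} - p_{ij}|$ for fixed i
and j. We would now like to find an error bound $\Delta$ over all i
and j. That is, we wish to find the largest $\Delta$ for which

$$P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta \text{ for some i and j}\,] \ge \alpha ,$$

which is roughly the same as

$$P[\,\hat{p}_{ij} - p_{ij} \ge \Delta \text{ for some i and j}\,] \ge \frac{\alpha}{2} . \qquad (3.4)$$

We apply the binomial approximation in equation (3.4), so
that

$$\begin{aligned}
P[\,\hat{p}_{ij} - p_{ij} \ge \Delta \text{ for some i and j}\,]
&= 1 - P[\,\hat{p}_{ij} - p_{ij} < \Delta \text{ for all i and j}\,] \\
&= 1 - \left(1 - \frac{\alpha}{2}\right)^{M^2} . \qquad (3.5)
\end{aligned}$$

Let $1 - (1 - \frac{\alpha}{2})^{M^2} = \beta$ for some $0 < \beta < 1$. Solve for $\alpha$, which
gives

$$\alpha = 2 - 2\,(1 - \beta)^{1/M^2} . \qquad (3.6)$$

Substitute the value of $\alpha$ in equation (3.6) into equation
(3.3). Finally we get the approximate bound $\Delta$ for all i and
j:

$$\Delta = N^{-1}\left((1 - \beta)^{1/M^2}\right)(0.5)\sqrt{\frac{M}{K_{\cdot\cdot}}} . \qquad (3.7)$$

Equation (3.7) gives an approximate expression for $\Delta$, using the
binomial approximation.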
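A numerical sketch of the all-entries bound follows. It assumes the reading that the $M^2$ entry errors are treated as independent, so the single-entry quantile level $1 - \alpha/2$ is replaced by $(1 - \beta)^{1/M^2}$, where $\beta$ is the overall probability; the worst case $p_{ij} = 1/2$ and equally likely states are assumed as before. The function name `delta_all_entries` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def delta_all_entries(beta, M, K_total):
    """Approximate bound on the largest error over all M*M entries.

    beta    -- probability that at least one entry errs by the returned amount
    M       -- number of states
    K_total -- total number of observed transitions (K..)
    """
    level = (1 - beta) ** (1 / M**2)      # per-entry quantile level
    quantile = NormalDist().inv_cdf(level)
    return quantile * 0.5 * sqrt(M / K_total)

for M in (10, 20, 30):
    print(M, round(delta_all_entries(0.90, M, 5000), 4))
```

Under these assumptions the bound grows with the number of states both through the quantile (more entries to maximize over) and through the factor $\sqrt{M/K_{\cdot\cdot}}$ (fewer observations per row).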

B.   APPROACH USING ORDER STATISTICS

Assume $Z_1, Z_2, \ldots, Z_M$ are independent continuous
random variables, each with density function $f_Z(z)$ and
distribution $F_Z(z)$. Now let $Z_{(1)}, Z_{(2)}, \ldots, Z_{(M)}$ denote
their ordered values, from smallest $Z_{(1)}$ to largest $Z_{(M)}$;
these are called the order statistics of $Z_1, Z_2, \ldots, Z_M$.
We now consider the probability law for $Z_{(M)}$ [Ref. 8], the
largest or maximum value.

The event $\{Z_{(M)} \le z\}$ occurs if and only if the event
$\{Z_1 \le z, Z_2 \le z, \ldots, Z_M \le z\}$ occurs, since if the largest
Z is smaller than z, all M of the random variables must be
smaller than z, where z is any fixed real number. The
distribution function for $Z_{(M)}$ is

$$\begin{aligned}
F_{(M)}(z) &= P[Z_{(M)} \le z] \\
&= P[Z_1 \le z, Z_2 \le z, \ldots, Z_M \le z] \\
&= P[Z_1 \le z]\,P[Z_2 \le z] \cdots P[Z_M \le z]
\end{aligned}$$

since $Z_1, Z_2, \ldots, Z_M$ are assumed independent. But each of
$Z_1, Z_2, \ldots, Z_M$ has the same distribution $F_Z(z)$. So

$$F_{(M)}(z) = [F_Z(z)]^M .$$
The density function for $Z_{(M)}$ then is

$$f_{(M)}(z) = M\,[F_Z(z)]^{M-1} f_Z(z) , \qquad \text{where } f_Z(z) = \frac{dF_Z(z)}{dz} .$$

Consider the limiting distribution function of the
maximum $Z_{(n)}$ as n tends to infinity. [Ref. 9, 10] show this
distribution is

$$\lim\,[F_Z(z)]^n = \exp\left\{-\exp\left[-\sqrt{2\log n}\,\left(z - \sqrt{2\log n}\right)\right]\right\} \qquad (3.8)$$

if $Z_1, Z_2, \ldots, Z_n$ is a random sample from a standard Normal
population. We want to find a bound $\Delta$ on the largest of $M^2$
errors between estimates in $\hat{P}$ and the unknown components of
$P$. The random variables $\hat{p}_{ij} - p_{ij}$ are very roughly Normal with
mean 0 and variance $\frac{M}{4K_{\cdot\cdot}}$, which is derived from equation
(3.1), for i, j = 1, 2, ..., M. Recall that $K_{\cdot\cdot}$ is the total
number of transitions observed.

Let $X_l = \hat{p}_{ij} - p_{ij}$, where $l = 1, 2, \ldots, M^2$. Then we know the
random variable X is approximately equal to
$\dfrac{Z}{2\sqrt{K_{\cdot\cdot}/M}}$, where Z has a standard Normal distribution.
Let the random variable $X_{(M^2)}$ be equal to $\max|\hat{p}_{ij} - p_{ij}|$. Then

$$\begin{aligned}
\lim P[X_{(M^2)} \le \Delta] &= \lim P[\,\max|\hat{p}_{ij} - p_{ij}| \le \Delta\,] \\
&= P\left[\text{smallest of } (\hat{p}_{ij} - p_{ij}) \ge -\Delta,\ \text{largest of } (\hat{p}_{ij} - p_{ij}) \le \Delta\right] .
\end{aligned}$$

Now $X_{(1)}$ and $X_{(M^2)}$ are asymptotically independent, so for large
M this is

$$\begin{aligned}
&= \{P[X_1 > -\Delta] \cdots P[X_{M^2} > -\Delta]\}\,[F_X(\Delta)]^{M^2} \\
&= [1 - F_X(-\Delta)]^{M^2}\,[F_X(\Delta)]^{M^2} \\
&= [F_X(\Delta)]^{2M^2} . \qquad (3.9)
\end{aligned}$$

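From equation (3.9) we derive an expression for $\Delta$, as
follows. Let $\Delta$ be the largest value for which

$$P[X_{(M^2)} \le \Delta] \le 1 - \alpha .$$

This is the complementary probability because we wish to
have $P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta \text{ for some i and j}\,] \ge \alpha$, as in the
previous section. The limiting distribution function of the
maximum $X_{(M^2)}$ is the same as

$$\lim\,[F_X(\Delta)]^{2M^2} = \lim\left[F_Z\left(2\Delta\sqrt{K_{\cdot\cdot}/M}\right)\right]^{2M^2} .$$

Then, approximately,

$$1 - \alpha = \exp\left\{-\exp\left[-\sqrt{2\log 2M^2}\,\left(2\Delta\sqrt{K_{\cdot\cdot}/M} - \sqrt{2\log 2M^2}\right)\right]\right\}$$

and

$$\log[-\log(1 - \alpha)] = -\sqrt{2\log 2M^2}\,\left(2\Delta\sqrt{K_{\cdot\cdot}/M} - \sqrt{2\log 2M^2}\right) .$$

Finally,

$$\Delta = \frac{1}{2}\sqrt{\frac{M}{K_{\cdot\cdot}}}\left[\sqrt{2\log 2M^2} - \frac{\log[-\log(1 - \alpha)]}{\sqrt{2\log 2M^2}}\right] . \qquad (3.10)$$

Equation (3.10) is an approximate expression for $\Delta$ based on
the asymptotic distribution of the extreme order statistic.
We will compare the central limit theorem $\Delta$'s with those
obtained with the extreme value distribution, in the next
section.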
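The extreme-value expression can be evaluated in the same way. The sketch below assumes the reading $\Delta = \tfrac{1}{2}\sqrt{M/K_{\cdot\cdot}}\,\bigl[a - \log(-\log(1-\alpha))/a\bigr]$ with $a = \sqrt{2\log 2M^2}$, obtained by solving the double-logarithm relation above for $\Delta$; the function name `delta_extreme_value` is an illustrative assumption.

```python
from math import log, sqrt

def delta_extreme_value(alpha, M, K_total):
    """Approximate bound on the largest error from the extreme value limit.

    alpha   -- probability that some entry errs by at least the returned amount
    M       -- number of states (the maximum is over 2 * M**2 one-sided errors)
    K_total -- total number of observed transitions (K..)
    """
    a = sqrt(2 * log(2 * M**2))  # normalizing constant sqrt(2 log n), n = 2 M^2
    return 0.5 * sqrt(M / K_total) * (a - log(-log(1 - alpha)) / a)

for M in (10, 20, 30):
    print(M, round(delta_extreme_value(0.90, M, 5000), 4))
```

For M = 20 or 30 states and 5000 observed transitions this gives values on the order of 0.1, in line with the rough magnitude discussed in the comparison that follows.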

C.   COMPARISON OF THE THREE EXPRESSIONS

The  three  expressions  for  A  obtained  using  the  central 
limit  theorem  and  order  statistics  have  been  developed  under 
approximations  such  as:  1)  the  steady  state  distribution 
of  {x^}  is  — pp-  (equally  likely)  ,  2)  the  variances  of 
1^.-  p..  I  have  7  J  ■  as  a  maximum  value  (worst  case),  and  3) 
all  transitions  are  independent.  Information  about  {X^}  is 
from  the  estimate  ^  because  we  don't  have  information  about 
the  unknown  P.  In  a  view  of  the  above  approximations  and 
computations,  our  expressions  for  A  are  very  rough.  However 
they  do  provide  some  insight  into  the  occurrences  and  sizes 
of  estimation  errors  in  P. 

Figure 3.1 contains 3 graphs showing $\Delta$ as a function of
$K_{..}$ and $M$ for fixed $\alpha$ = 0.90 based on the three expressions
(3.3), (3.7) and (3.10).

The first graph shows error bounds using the central limit
theorem on $\hat{p}_{ij} - p_{ij}$ for fixed $i$ and $j$. The second graph is
given by the same approach as the first graph, except that
overall estimation errors are considered, for all $i$ and $j$.
The third graph is based on the asymptotic distribution of
the largest value of $|\hat{p}_{ij} - p_{ij}|$ over all $i$ and $j$.

Figure 3.1    Variation of $\Delta$ for Varying $K_{..}$ and $M$ ($\alpha$ = 0.90).

From Figure 3.1 we see that the largest estimation error
depends very much on the number of transition observations
and matrix size, but not so much on the $\alpha$ value, as seen
from Figure 3.2.

Figure 3.2    Variation of $\Delta$ by Changing Alpha ($\alpha$).

Graphs 2 and 3 in Figure 3.1 are very similar even though
they use different approaches. They give an idea of how
large likely values of $\Delta$ are for given $K_{..}$ and $M$, in the
"worst case".

If we consider a Markov chain $\{X_n\}$ with $M$ = 20 or 30
states, and we have observed $K_{..}$ = 5000 transitions then,
roughly, it is likely (prob = 0.90) that at least one
element of $\hat{P}$ is in error by at least 0.1. In general,
expressions (3.7) and (3.10) may be useful for Markov chains
$\{X_n\}$ with $M$ = 20 or 30 states and large numbers of observed
transitions.
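
The figure of roughly 0.1 quoted above can be checked numerically. The following is a minimal sketch, not the thesis's own computation: it assumes that each row of the matrix is observed about K/M times, that each error has the worst-case standard deviation 1/(2*sqrt(K/M)), and that the maximum of 2M^2 standard normal variables is located and scaled by sqrt(2 log 2M^2). The function name `extreme_value_delta` is hypothetical.

```python
import math

def extreme_value_delta(M, K, alpha):
    """Approximate largest estimation error from the extreme-value argument.

    Assumed (not confirmed by the text): K/M observations per row,
    worst-case standard deviation 1 / (2 * sqrt(K / M)) for each error,
    and a = sqrt(2 * log(2 * M**2)) as both location and scale constant.
    """
    a = math.sqrt(2.0 * math.log(2 * M * M))
    # Solve log(-log(1 - alpha)) = -a * (2 * delta * sqrt(K / M) - a) for delta.
    z = a - math.log(-math.log(1.0 - alpha)) / a
    return z / (2.0 * math.sqrt(K / M))

for M in (20, 30):
    print(M, round(extreme_value_delta(M, 5000, 0.90), 3))
```

Under these assumptions, 5000 observed transitions give values slightly above 0.1 for both M = 20 and M = 30, in line with the statement in the text, and quadrupling the number of transitions halves the value.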

D.   SENSITIVITY OF LUMPING CONDITIONS

We have developed expressions for $\Delta$, using the central
limit theorem and order statistics. We want to examine the
sensitivity of the lumping conditions applied to $\hat{P}$, the
estimate of $P$. If equation (2.1), which is a necessary
condition for lumping a Markov chain with transition matrix
$P$, is satisfied, then the lumped transition matrix is
given by equation (2.2). However, even though $P$ satisfies
the lumping conditions, it is extremely unlikely that its
estimate $\hat{P}$ will also satisfy these conditions, as we shall
now demonstrate.

In order to simulate the difference between $P$ and $\hat{P}$,
consider a matrix of errors $R \cdot \Delta$, where $R$ is a random matrix
with the same dimension as $P$, whose components are 1's,
-1's, and 0's, and whose rows each sum to zero. Now
consider the lumpability of the simulated estimate $P^*$, which
is constructed by taking $P$ plus the random matrix $R$ times
$\Delta$, that is, $P^* = P + R \cdot \Delta$.
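
One way to generate such a perturbation is sketched below. The text does not say how the random matrix is sampled, so the scheme here (an equal number of +1's and -1's placed at random in each row) is an assumption, as are the function names and the 3-state matrix used for illustration.

```python
import random

def random_sign_matrix(m, rng):
    """Return an m x m matrix of 1's, -1's and 0's whose rows each sum to zero."""
    rows = []
    for _ in range(m):
        k = rng.randint(0, m // 2)          # number of +1's, equal to the number of -1's
        cols = rng.sample(range(m), 2 * k)  # distinct positions to perturb
        row = [0] * m
        for c in cols[:k]:
            row[c] = 1
        for c in cols[k:]:
            row[c] = -1
        rows.append(row)
    return rows

def perturb(P, R, delta):
    """Simulated estimate P* = P + R * delta, element by element."""
    return [[p + r * delta for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(P, R)]

rng = random.Random(0)  # fixed seed, only so the sketch is reproducible
P = [[0.5, 0.25, 0.25],
     [0.2, 0.3, 0.5],
     [0.2, 0.6, 0.2]]   # hypothetical transition matrix
R = random_sign_matrix(3, rng)
P_star = perturb(P, R, 0.05)
```

Because every row of R sums to zero, every row of P* still sums to one. Individual entries can leave the interval [0, 1] when an element of P is smaller than the perturbation size, which this sketch does not guard against.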

To show the sensitivity of the lumping conditions, we
assume the unknown $P$ is lumpable with lumping matrix $B$, and
consider the difference $(BAP^*B - P^*B)$. If equation (2.1) is
satisfied by $P^*$ then all of these components must be zero.
Theorem 8. The difference $(BAP^*B - P^*B)$ is a linear
function of $\Delta$.

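Proof. Let $R$ be the random matrix as defined above and
let $P^* = P + R \cdot \Delta$. Then $(BAP^*B - P^*B)$ is given by

$$
\begin{aligned}
BA(P + R \cdot \Delta)B - (P + R \cdot \Delta)B
  &= BAPB + BARB \cdot \Delta - PB - RB \cdot \Delta \\
  &= BAPB - PB + (BARB - RB) \cdot \Delta \\
  &= (BARB - RB) \cdot \Delta \\
  &= C \cdot \Delta ,
\end{aligned}
$$

where the term $BAPB - PB$ vanishes because $P$ is lumpable, and
$C = BARB - RB$.

Therefore the difference $BAP^*B - P^*B$ is linearly dependent
on $\Delta$ and $P^*$ is not lumpable unless $BARB = RB$ (i.e., $R$
is "lumpable"), which is not likely to occur.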
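
The linearity can also be checked numerically on a small example. The sketch below uses a hypothetical 3-state chain that is lumpable with respect to the partition {0}, {1, 2}. The forms taken for the matrices A and B (B as a state-to-lump indicator matrix, A as uniform weights within each lump) follow the usual convention and are an assumption here, since their definitions lie outside this section.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lumping_residual(P, A, B):
    """Return BAPB - PB, which is the zero matrix when P satisfies BAPB = PB."""
    PB = matmul(P, B)
    BAPB = matmul(B, matmul(A, PB))
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(BAPB, PB)]

P = [[0.5, 0.25, 0.25],
     [0.2, 0.3, 0.5],
     [0.2, 0.6, 0.2]]                      # hypothetical lumpable chain
B = [[1, 0], [0, 1], [0, 1]]               # assumed form: state-to-lump indicators
A = [[1, 0, 0], [0, 0.5, 0.5]]             # assumed form: uniform weights per lump
R = [[1, -1, 0], [0, 1, -1], [-1, 0, 1]]   # one admissible perturbation pattern

C = lumping_residual(R, A, B)              # C = BARB - RB
for delta in (0.01, 0.02, 0.04):
    P_star = [[p + r * delta for p, r in zip(pr, rr)] for pr, rr in zip(P, R)]
    D = lumping_residual(P_star, A, B)
    gap = max(abs(D[i][j] - C[i][j] * delta) for i in range(3) for j in range(2))
    print(delta, gap)                      # gap is zero up to rounding error
```

For this choice of R the matrix C is not zero, so the residual grows in proportion to the perturbation size and the perturbed matrix fails the lumping condition for every nonzero perturbation.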

Since $\hat{P}$ is likely to have elements differing appreciably
from the corresponding elements in $P$ (errors of size $\Delta$), it
can be seen that the lumpability conditions will not be
satisfied (not even nearly so) by $\hat{P}$, even though $\{X_n\}$ is
lumpable. We conclude that attempting to check the
lumpability of the estimate $\hat{P}$ when $P$ is not known is not useful.
IV.  SUMMARY AND CONCLUSIONS


We have given several theorems associated with eigenvalues
and eigenvectors for lumpable Markov chains $\{X_n\}$ with
finite state spaces. We have derived rough, approximate
mathematical expressions for the largest error made in
estimating $P$ by $\hat{P}$ based on transition data.

Expressions (3.7) and (3.10) give very similar results, even
though the estimated $\Delta$'s for the first expression are
slightly less than those for the second expression. These
expressions show that the largest estimation errors depend
very much on the number of transition observations and on
the matrix size, but not so much on the $\alpha$ value.

Since $\hat{P}$ is likely to have elements differing appreciably
from the corresponding elements in $P$, it is of interest to
examine whether the equation $BAPB = PB$ is likely to be
nearly satisfied with $\hat{P}$, i.e., will $(BA\hat{P}B - \hat{P}B)$ be nearly
zero? This is examined by simulation of "estimates" $P^*$ of
$P$, using random perturbations of elements of $P$ of sizes $\Delta$
which are likely to occur as errors in $\hat{P}$.

This  shows  that  the  classical  lumping  conditions  are 
extremely  sensitive  to  estimation  errors  which  can  be 
expected  to  occur  even  when  a  large  number  of  transitions 
have  been  observed.  Thus,  the  classical  lumping  conditions 
may  be  of  limited  value  in  many  actual  applications. 

As further research, it is recommended that some
constructive approach to finding matrices $B$ for lumping a
lumpable Markov chain $\{X_n\}$ be developed, perhaps along the
lines of the theorems mentioned in Chapter 2. It is hoped
that the present study will be useful to those who might
otherwise have endeavored to check the classical condition
for lumpability of a Markov chain $\{X_n\}$ when the transition
matrix $P$ has been estimated.